A Purposeful Selection of Variables Macro for Logistic Regression
Abstract
The central problem in any model-building situation is choosing, from a large set of covariates, those that should be included in the “best” model. A decision to keep a variable in the model may rest on clinical or statistical significance. SAS PROC LOGISTIC offers several built-in variable selection algorithms, but those methods are mechanical and therefore carry limitations. Hosmer and Lemeshow describe a purposeful selection of covariates algorithm in which the analyst makes a variable selection decision at each step of the modeling process. In this paper we introduce a macro, %PurposefulSelection, which automates that process. The macro is based on the following algorithm: (1) fit a univariate model with each covariate; (2) select as candidates for a multivariate model those covariates that are significant at some chosen alpha level; (3) identify the variables that are not significant in the multivariate model at some arbitrary alpha level; (4) fit a reduced model and evaluate confounding by the change in parameter estimates; (5) repeat steps 3 and 4 until the model contains only significant covariates and/or confounders; and (6) add back into the model, one at a time, any variable not originally selected, keep any that are significant, and reduce the model following steps 3 and 4. At the end of step 6, the analyst will have a “main effects model.” The performance of the macro is illustrated with an application to the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data.
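The six steps in the abstract can be read as a small control loop. The Python sketch below is one possible reading of that loop and is not the code of the %PurposefulSelection macro. Several things are assumptions made for illustration: the `fit` callback (a hypothetical function that fits a logistic model on the named covariates and returns each covariate's coefficient and p-value), the function names, and the default thresholds `alpha_enter`, `alpha_stay`, and `max_change`, which stand in for the "chosen" and "arbitrary" levels that the abstract leaves unspecified.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: maps covariate names to {name: (coefficient, p_value)}.
Fit = Callable[[Sequence[str]], Dict[str, Tuple[float, float]]]


def reduce_model(model: Sequence[str], fit: Fit,
                 alpha_stay: float, max_change: float) -> List[str]:
    """Steps 3-5: drop non-significant covariates unless they are confounders."""
    model = list(model)
    confounders = set()  # non-significant covariates retained as confounders
    while True:
        full = fit(model)
        # Step 3: non-significant covariates not already kept as confounders.
        removable = [v for v in model
                     if full[v][1] > alpha_stay and v not in confounders]
        if not removable:
            return model
        worst = max(removable, key=lambda v: full[v][1])
        # Step 4: fit the reduced model and compare the remaining estimates.
        reduced_vars = [v for v in model if v != worst]
        reduced = fit(reduced_vars)
        confounds = any(
            full[v][0] != 0
            and abs(reduced[v][0] - full[v][0]) / abs(full[v][0]) > max_change
            for v in reduced_vars
        )
        if confounds:
            confounders.add(worst)   # keep it: removal shifts other estimates
        else:
            model = reduced_vars     # drop it and repeat (step 5)


def purposeful_selection(candidates: Sequence[str], fit: Fit,
                         alpha_enter: float = 0.25,   # assumed placeholder
                         alpha_stay: float = 0.10,    # assumed placeholder
                         max_change: float = 0.15     # assumed placeholder
                         ) -> List[str]:
    # Steps 1-2: univariate screen at alpha_enter.
    selected = [v for v in candidates if fit([v])[v][1] <= alpha_enter]
    model = reduce_model(selected, fit, alpha_stay, max_change)
    # Step 6: re-try each covariate that failed the univariate screen.
    for v in candidates:
        if v in selected:
            continue
        if fit(model + [v])[v][1] <= alpha_stay:
            model = reduce_model(model + [v], fit, alpha_stay, max_change)
    return model
```

In practice `fit` would wrap whatever logistic regression routine is available. Keeping it as a callback keeps the selection logic separate from the estimation and lets the loop be checked against a stub.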
Similar resources
Purposeful Selection of Variables in Logistic Regression: Macro and Simulation Results
The main problem in any model-building situation is to choose from a large set of covariates those that should be included in the best model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms embedded in SAS PROC LOGISTIC. Those methods are mechanical and as such carry some limitations. Hosmer...
Augmented Backward Elimination: A Pragmatic and Purposeful Way to Develop Statistical Models
Statistical models are simple mathematical rules derived from empirical data describing the association between an outcome and several explanatory variables. In a typical modeling situation statistical analysis often involves a large number of potential explanatory variables and frequently only partial subject-matter knowledge is available. Therefore, selecting the most suitable variables for a...
A SAS Macro for Hosmer and Lemeshow’s Purposeful Selection Model Building Algorithm: Description and Performance
A common problem in many model-building situations is to choose from a large set of covariates those that should be included in the “best” model. An additional consideration in modeling epidemiological data is the inclusion of confounders, which adds a quirk in the modeling procedure in that statistical significance is not the main criterion for keeping predictors in a model. Hosmer and Lemeshow (2000...
Effects of Multicollinearity in All Possible Mixed Model Selection
The effects of multicollinearity in all possible model selection of fixed effects including quadratic and cross products in the presence of random and repeated measures effects are presented here. The user-friendly SAS macro application ALLMIXED2 complements the model selection option currently available in the SAS macro applications ‘REGDIAG’ and ‘LOGISTIC’ for multiple linear and logistic reg...
Credit Risk Measurement of Trusted Customers Using Logistic Regression and Neural Networks
The issue of credit risk and deferred bank claims is one of the sensitive issues of the banking industry, and it can be considered the main cause of bank failures. In recent years, the economic slowdown accompanied by inflation in Iran has led to an increase in deferred bank claims that could put the country's banking system in serious trouble. Accordingly, the current paper presents a prediction...